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Abstract 



In both population genetics and forensic genetics it is important to know how haplo- 
types are distributed in a population. Simulation of population dynamics helps facilitating 
research on the distribution of haplotypes. In forensic genetics, the haplotypes can for 
example consist of lineage markers such as short tandem repeat loci on the Y chromo- 
some (Y-STR). A dominating model for describing population dynamics is the simple, yet 
powerful, Fisher- Wright model. We describe an efficient algorithm for exact forward sim- 
ulation of exact Fisher- Wright populations (and not approximative such as the coalescent 
model) . The efficiency comes from convenient data structures by changing the traditional 
view from individuals to haplotypes. The algorithm is implemented in the open-source R 
package fwsim and is able to simulate very large populations. We focus on a haploid model 
and assume stochastic population size with flexible growth specification, no selection, a 
neutral single step mutation process, and self-reproducing individuals. These assumptions 
make the algorithm ideal for studying lineage markers such as Y-STR. 

Keywords: fwsim, R, k-d tree, forensic genetics, Y-STR, stepwise mutation model, single step 
mutation model. 



Simulation of population dynamics is an important tool when studying genetic traits. In 
both population genetics and forensic genetics it is important to know how haplotypes are 
distributed in a population. In forensic genetics, the haplotypes can for example consist of 
lineage markers such as short tandem repeat loci on the Y chromosome (Y-STR). Simulation 
of population dynamics helps facilitating research on the distribution of haplotypes. A dom- 
inating model for describing population dynamics is the simple, yet powerful, Fisher- Wright 
model (or process) (Fisher 1922, 1930, 1958; Wright 1931; Ewens 2004). In population genet- 
ics, the model also forms the basis for coalescent theory (Kingman 1982; Hudson 2001; Hein, 
Schierup, and Wiuf 2005). 

Because the Fisher- Wright model is widely used in population genetics, efficient simulation 
algorithms and tools are needed. In this paper we describe the model implemented in the R (R 
Development Core Team 2012) package fwsim (Andersen and Eriksen 2012), which provides 
an efficient tool for simulating certain kinds of Fisher- Wright populations. The simulation 
scheme described in this paper is exact (from the Fisher- Wright model) and not approximative 
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like the simulation scheme from the coalescent model (Kingman 1982; Hudson 2001; Hein et al. 
2005). 

(Ewens 2004) is a good reference on different models in population genetics as it explains 
several models and also gives theoretical results. 

First some nomenclature must be introduced. Let a locus (loci in plural) be a specific location 
on the chromosome. The content of a locus is called an allele, which consists of DNA sequences. 
Here, we assume that the alleles are short tandem repeats (STRs) (Butler 2005) with values 
in Z (in genes, an allele could also just be either of two states, A or B, say). A haplotype is 
a ordered collection of alleles at loci that are transmitted together. 

We focus on a haploid model, where each individual is a gamete with a haplotype consisting 
of r loci. Hence, a haplotype can in this context be thought of as a vector in 717 . It may 
for example be an Y-STR haplotype. We assume no selection and the individuals are self- 
reproducing. 

First, the traditional Fisher- Wright model without mutations is described in order to introduce 
the notation and to make it possible to compare it with our model. 

Throughout this paper, whenever there is a mutation process, we assume it to be a neutral 
(in the sense of no selection) single step mutation process with infinitely many possible allelic 
states. This model was introduced by (Ohta and Kimura 1973) and some mathematical 
properties were recently discussed in (Caliebe, Jochens, Krawczak, and Rosier 2010). 

1.1. Fisher- Wright model without mutation 

Traditionally, a simple Fisher- Wright model, for example as formulated by (Ewens 2004), 
assumes constant population size and no mutations. A Fisher- Wright model is often char- 
acterised by a binomial sampling scheme focusing on individuals (or a multinomial sampling 
scheme focusing on the entire population), such that a new generation of children is sampled 
by letting each child choose its parent (and thus its haplotype) uniformly at random. 

Because our interest is aimed at the sampling of populations and not at the genealogy, the 
focus is now changed from individuals to haplotypes, where identical haplotypes are treated 
similarly, as we are not interested in the genealogical tree itself, but only in the haplotypes 
and their counts in the resulting population (and possibly in the intermediate populations, 
too). 

Let N be the constant, known population size and H the set of haplotypes. Denote by rij(x) 
the number of haplotypes in the i'th generation of haplotype x £ H and Zi+\(x) the number 
of children from haplotype x G H in generation i + 1. Because there are no mutations, we 
have that m+i(x) = Zi + \{x). 

The simple Fisher- Wright model arises by assuming that P ({rii + i(x)} x& H \ {ni(x)}x£H) is 
given by 




(1) 



A property of the multinomial distribution is that 



E [n i+ i(x) | rn(x)] 



rii(x) 
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as expected. 

We note that the process is a Markov chain with \H\ absorbing states, one for each haplotype. 



As mentioned in Section 1.1, the model is formulated on the basis of haplotypes instead of 
individuals, because it is much more efficient when we are interested in the resulting population 
after a number of generations rather than the genealogy. 

The notation from Section 1.1 is adopted, such that Hi is the set of haplotypes in the i'th 
generation (Hi depends on i due to mutations, which will be introduced below), ni(x) is the 
number of haplotypes in the i'th generation of haplotype x 6 Hi, and Zi + \(x) the number of 
children from haplotype x G H, L . Now let N{ = YlxeH n i( x ) De the population size in the i'th 
generation (instead of a constant population size N as in the simple Fisher- Wright model in 
Equation 1). 

Our model is then a specification of how 

{z i+1 (x)} xeHi | {ni(x)} xeHi 

is distributed, that is, how the haplotypes in the next generation are conditionally distributed 
given the previous generation. 

Two important features of our model is, that it assumes stochastic population size - which 
we believe is a more realistic model - and allows flexible population growth specification. We 
believe that the Fisher- Wright model that will be introduced below with stochastic population 
size also incorporating flexible population growth has not yet been defined like we do in the 
following. First the modelling of the population size and growth will be described. Afterwards 
the mutational model will be explained. 

2.1. Population size and growth 

Let Nq be the known initial population size. Note that in the traditional Fisher- Wright model, 
this is assumed to be a constant. 

Then we assume that 



for on > (oii > 1 gives growth and < a% < 1 gives decline). For example, if aj = a for all 
i, then 



2. Model 



Poisson(«jAj_i) 



(2) 



E[Ni] = a 1 N 



that is exponential population growth 



One could also choose 




f3, for % < t, 
a, else, 



yielding 




for i <t, 
else, 



4 



Efficient Simulation of Fisher-Wright Populations 



which for example can be used to get exponential growth up to generation t and afterwards 
an expected constant population size by setting a = 1. 

A possibly more realistic example is logistic population growth, which can be obtained by 
specifying a maximum population size N max , a > 1, and then setting 



as the growth rates. A closed form expression for E[JVj] in this case seems difficult to obtain. 

One could alternatively also create a (possibly decreasing) rate a% = f(i) for some function 
/. Hence, the specification of growth is rather flexible. 

2.2. Number of children 

As mentioned previously, the conditional distribution {zi + \(x)} x ^Hi \ i x i( x )}x£Hi must be 
specified. We assume that the number of children Z{+i(xq) of a certain haplotype xq E Hi 
is conditionally independent of the number of children of other haplotypes, given the entire 
previous generation {xj(x)} xg # 4 . Thus, only the marginal distribution Zj + i(xo) | {xi(x)} x< =Hi 
must be specified. 

For each haplotype x$ E Hi in the i'th generation occuring rij(xo) times, we then assume that 
the number of children Zi+i(xo) is distributed independently of other haplotypes as 



As can be seen, Zi+\{xo) actually only depends on nj(xo) an d not on the number of all the 
other haplotypes. 

It then follows that Ni + \ = Ei-eH, z »+i( x ) (the sum of the number of haplotypes in the 
(i + l)'th generation) conditionally on {ni(x)} x eHi follows a Poisson(aj + iiVj) distribution, 
and that 



as expected, which is also true for the simple Fisher- Wright model in Equation 1. 
2.3. Mutation model 

As mentioned in the introduction, we assume a neutral (in the sense of no selection) single 
step mutation process on Z. Instead of just one locus we extend it to r loci, where mutations 
on loci happen independently. We assume per locus and direction mutation rates. Let 



oti = a — 




Zi+i(x ) | {ni(x)} xeHi ~ Poisson(ai + inj(x )). 



(3) 



Zi+l(xo) | {ni(x)} xe Hi, Ni+l ~ Binomial N i+ i 




Q = {-l,0,iy = {-1,0,1} x ••• x {-1,0,1}. 

' 



r factors 



where x denotes the Cartesian product, be the lattice of possible mutations. Let 



Sj q = ~l 



i \ I 1 - Sj - ujj 

Uj 



(4) 



else 



Mikkel Meyer Andersen, Poul Svante Eriksen 



5 



denote the mutation probabilities for the j'th locus and 

r 

p(q) = Y[pj(Qj) 

for a mutation configuration q = (gi, q%, . . . , q r ) G Q from the fact that mutations are assumed 
to happen independently across loci. 

Let 



Ci+i= [J {x!+q} 



qeQ 

be all possible candidate haplotypes for the {i + l)'th generation. 
Our model with mutations is then 



71; 



-l(yo) I {ni(x)}xeHi ~ Poisson j a i+1 ^ p(q)ni(y - q) J for all y € C i+1 , (5) 

qeQ 



resulting in A^ +1 | Ni ~ Poisson(aj + iA^) as assumed in Equation 2 because 

^2 a>i+i^2p(q)ni(y - q) = a i+ i^2p{q) ^ n;(y - q) 
2/oSCi+i qeQ qeQ yoeC i+1 

qeQ 
= a i+ iNi. 

Another way to formulate an equivalent model, which will be used in the implementation, is 
as follows. Let 771^+1(2;, x + q) denote the number of mutants mutating from x to x + q in the 
transition from the i'th generation to the (i + l)'th generation and 

M i+ i(x) = {m i+1 (x,x + q)} q eQ 

the number of mutants for all possible configurations in Q. 

Then assume that {Mi + i(x)} X £H i are conditionally independent given {zj + i(x)} xg //; i , thus 
only the marginal distribution is to be specified. If we model this conditional marginal dis- 
tribution as 

M i+1 (x ) I {z i+1 (x)} xeHi ~ Multinomial (z i+1 (x ), {p(q)} q eQ) , (6) 

and set 

n i+1 (x) = ^ (x ~ q,x), 
qeQ 

we get a model equivalent to the one specified in Equation 5. 
2.4. Absorbing state 

The model in Equation 5 (or the equivalent model in Equation 6) has positive probability 
of dying out, because the Poisson distribution has probability mass in for every parameter 



6 



Efficient Simulation of Fisher-Wright Populations 



value. This means that population size is an absorbing state. Also note that this absorbing 
state is independent of the mutation rate, as the population size is independent of the mutation 
rate. 



In this section, some implementation details are discussed. As already mentioned, the de- 
scribed model is implemented in the R (R Development Core Team 2012) package fwsim 
(Andersen and Eriksen 2012) using the C programming language. The package fwsim is 
released under the BSD license. 

First some implementation details are explained and then a few examples are given. 

3.1. Haplotype container 

Each generation consists of a number of haplotypes, each with a count of the number of times 
it is present in the generation. These haplotypes are saved in a data container. This data 
container is a so-called k-d tree (Bentley 1975) (this abbrivation stands for k dimensional 
tree), which is a generalisation of a binary search tree. Whereas binary search trees are for 
one dimensional points (numbers), k-d trees are for k dimensional points (vectors). Like 
binary search trees, the time complexity for insertion and searching in a k-d tree is O(logre) 
for a tree with n nodes. 

For each generation, a new k-d tree is created and nodes inserted or updated as the haplotypes 
are evolved one at a time. A node in the tree contains both the point (haplotype) and 
additional information, which here is only a count (of the number of individuals having this 
particular haplotype). 

The implementation of k-d trees is based on http : //code . google . com/kdtree released under 
the BSD license, but has been heavily modified for example by changing some data structures 
and adding node searching and updating functionality. 

3.2. Mutation model 

In this section, the implementation of the mutation model defined in Section 2.3 is described. 

The mutation model is implemented by dividing the number of children Equation 3 into 
categories depending on the number of times they mutate. There are r + 1 categories, namely 
for d = 0, 1, . . . , r mutations on the r loci. Because this is the stepwise mutation model, only 
one mutation can happen per locus at a time. 

As before, z,+i(x) is the number of children from haplotype x € H. Let zf +l (x) be the number 
of children in the cfth category such that Zj + i(x) = Yld=o z i+i( x )- ^ we assume that 



3. Implementation 




(7) 



where rjd is the probability for d mutations with Xld 7 ?^ = 1j tb en Equation 3 still holds. 
Naturally, each of the zf, 1 (x) children have to choose their d mutations independently of the 
others. 
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To see the analogue between rrii + i(x,x + q) and Zi + \{x), first let 

Qd = \q^Q \\q\\i = dX, 

where || • ||i denotes the L 1 norm such that \\q\\\ = \\(qi, q2, ■ ■ ■ , Qr)\\l = Y7j=i That is, 
Qd is the mutation configurations resulting in precisely d mutations. Then 



z i+ 



q&Qd 



First the probability of not mutating is treated. Let fj,j = 5j +ojj be the mutation rate for the 
j'th locus for j = 1,2, ... , r with Sj denoting the downwards mutation rate and Uj denoting 
the upwards mutation rate. Then 

r 

is the probability of not mutating. 

Now the model of choosing the mutating loci is discussed. There are (^) ways to choose 
the d loci that should mutate. Each of these loci configurations has 2 d possible mutation 
configurations (the size of the cartesian product { — 1, This means that there is a total of 
2 d (2) possible ways to mutate d times. The probability for mutating to a specific haplotype 
is determined by the d locus specific upwards and downwards mutation rates. 

For mutation category d, let 



S d = {sC{l,2,...,r} | \s\ = d] 



be a so-called simple table with (^) rows. Then the probability that it is exactly the loci 
s € Sd that should mutate, is 

P( s ) =Y[h II ( 1- ^')> 
where s c = {1, 2, . . . , r, } \ s. Further, the probability of exactly d mutations is 

s£S d 

Hence, Equation 7 is now fully specified. To decide the direction of the mutations, let 

Ed = {(s,q) | a € S d ,q:s^ {-1,1}} 

be a so-called extended table with 2 d ( d ) rows. The function q maps a locus to a mutation 
direction. Then each row e = (s, q) S Ed and has probability 

p(e) = I[pMj)) U0--Pi(g(J))), 

j£s j(z s C 
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where Pj(q(j)) is denned in Equation 4. We still have that the sum of the rows in the extended 
table is 

Then for generation i, haplotype x, and mutation category d, we assume that 



Both the simple and extended table for mutation category d = 1,2, ... ,r {d = does not 
require this step) are created before the actual simulation starts as the probabilities are 
constant during the evolution. They are constant because the mutation rates are assumed 
constant. This is what is done in the fwsim package for all mutation categories, although this 
may be changed in future releases if the following theoretical limitations turn out to occur in 
practise, too. 

Note that 2 d Q, the size of the extended table, is exponentially growing and may become 
really large for even relatively small r and that the corresponding extended tables take some 
time to generate. For example, for r = 16 and d = 11 the size of the extended table is 
8,945,664 (the maximal for that choice of r), however, it is still possible to be created and 
used for simulation. Once the tables are created, the simulations run rather smoothly because 
they are just stored in memory. 

On the other hand, the mutation rate would normally be so low that mutations in the cate- 
gories for even small d may rarely or never happen depending on the population size, which 
means that these mutation categories are probably better delt with manually as follows. Re- 
call that % only depends on the simple table, which is small compared to the extended table - 
namely a factor of 2 d smaller - and so the simple table can still be calculated to a rather large 
r. When the simple tables are generated, then draw n from Poisson(aj+ir/^nj(x)) and mutate 
each of the haplotypes manually one at a time by choosing the d loci and their directions 
randomly according to their probabilities. 



The simulation method described above is developed with efficiency in mind. To illustrate 
that efficiency is achieved, the computation time for different parameters have been investi- 
gated using a laptop with a 2.40GHz Intel(R) Core(TM) i5 CPU (model M 520). For these 
computations, fwsim (Andersen and Eriksen 2012) version 0.2-5 was used. 

In Figure 1, the absolute computation time for simulating a population with a varying number 
of loci is shown. In Figure 1, the computation time for simulating a population with a varying 
initial population size is shown. Both figures show that the algorithm is quite fast. 

In Table 1, the computation time using fwsim compared to a naive implementation (focusing 
on individuals rather than haplotypes) of simulating under a Fisher- Wright model is shown. 
As seen, fwsim is magnitudes faster than a naive implementation: On average, fwsim is 
almost 2,000 times faster when simulating a population with an initial size of 5,000, no ex- 
pected growth (by using the growth parameter a = 1), and a mutation rate of 0.003 in 100 
generations than the naive implementation (focusing on individuals rather than haplotypes). 
Further, the memory consumption is smaller for fwsim as it uses haplotypes instead of in- 




4. Computation time 



Mikkel Meyer Andersen, Poul Svante Eriksen 



9 



o 

o 

o 



o 



o 



o 

o 

o 

I I I I 

2 4 6 8 10 

Number of loci 

Figure 1: The computation time depending on the number of loci. The initial population size 
is set to 10,000, the number of generations to 500, the mutation rate to 0.003, the growth 
parameter to 1 (meaning constant expected population size). The computation time for each 
number of loci is the median computation time of 10 simulations. 

dividuals, which means that it is possible to simulate much larger populations than with a 
naive implementation. 

5. Examples 

In this section, some examples are presented. Please refer to ?f wsim in R for more information 
about usage of the package fwsim. These examples were made using version 0.2-5 of fwsim 
(Andersen and Eriksen 2012). 

5.1. Simple usage 

Lauching an R session and typing the code below will show a short example of the model 
implemented in the package fwsim (k is the number of individuals in the initial population, g 
is the number of generations to evolve, r number of loci, mu mutation rate per loci, alpha is 
the population size growth rate and trace is whether to display trace information): 



o 



o 
O 



in 

o 
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70000 



"i r 
90000 



Size of initial population 



Figure 2: The computation time depending on the initial population size. The number of 
loci is set to 5, the number of generations to 500, the mutation rate to 0.003, the growth 
parameter to 1 (meaning constant expected population size) . The computation time for each 
number of loci is the median computation time of 10 simulations. 

library ( "fwsim ") 
set. seed (1) 

pop <- fwsim(k = 10000, g = 1000, r = 3, mu = 0.003, 
alpha = 1.001, trace = TRUE) 

To obtain a contingency table of the first two loci, use the following: 

sum (pop$haplotypes$N) 

[1] 27672 

xtabs(N ~ Locusl + Locus2, pop$haplotypes) 
Locus2 

Locusl -5 -4 -3 -2 -1 1 2 3 4 5 6 
-6 000052000000 
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k 


9 


M 


Speed-up 


1,000 


100 


0.001 


145.9 


1,000 


100 


0.003 


127.2 


1,000 


200 


0.001 


307.9 


1,000 


200 


0.003 


372.5 


5,000 


100 


0.001 


2,972.1 


5,000 


100 


0.003 


1,957.0 


5,000 


200 


0.001 


6,848.4 


5,000 


200 


0.003 


4,887.1 



Table 1: A comparison of the computation time for fwsim and a naive implementation (fo- 
cusing on individuals rather than haplotypes). A growth parameter a = 1 is used meaning no 
expected population growth, k is the initial population size, g is the number of generations 
to evolve, and n. is the mutation rate. 10 replications for each parameter combination (cor- 
responding to a row in the table) were performed. The speed-up column is the computation 
time for the naive implementation divided by the computation time for fwsim. This means 
that fwsim on average is roughly 2,000 times faster to simulate a population with an initial 
size of 5,000 and a mutation rate of 0.003 in 100 generations than the naive implementation. 
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This table is plotted in Figure 3. A slight drift from the initial (0, 0) has occured. 

We can also see the 10 most frequent haplotypes compared to the initial (0, 0, 0) haplotype: 

pop$haplotypes [order (pop$haplotypes$N , decreasing = TRUE) [1 : 10] , ] 
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Figure 3: A contour plot of the contingency table of the first two loci. A slight drift from the 
initial (0, 0) has occured. 

pop$haplotypes [which (apply (apply (pop$haplotypes [, 1:3], 1, abs) , 
2, sum) == 0) , ] 

Locusl Locus2 Locus3 N 
280 255 

In Figure 4, the actual population sizes are compared to expected population sizes. This 
figure was made with following code: 

plot(pop$sizes, type = "1", xlah = "Generation" , 

ylab = "Population size", lty = 1) 
lines (pop$expected. sizes, lty = 2) 

legend C "topi eft" , legend = c (" Actual" , "Expected ") , lty = 1:2) 



5.2. Genetic drift of alleles 



To illustrate how genetic drift in terms of changed allele frequencies occurs, the allele fre- 
quencies after a different number of generations are recorded. The fwsim package also has 
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Figure 4: The actual population sizes compared to expected population sizes 



the possibility of saving the intermediate populations, which is used to show how allele fre- 
quencies change during the evolution. Thus, genetic drift can be investigated as follows (k is 
the number of individuals in the initial population, alim is the limit of which alleles to plot 
and gs is which generations to sample allele frequencies from): 

library ( "fwsim ") 
set. seed (1) 
alim <- 2 
k <- 100000000 
g <- 10000 

gs <- seq(100, g - 1, by = 100) 
pop <- fwsim(g = g, k = k, r = 1, alpha = 1, 
mu = 0.003, gs = gs, trace = FALSE) 

interhapfreq <- lapply(pop$intermediate.haplotypes[gs], function (hap) { 
tab <- prop. table (xtabs(N ~ Locusl, hap)) 

as. vector (tab [which(abs (as. numeric (names (tab))) <= alim)]) 

}) 
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freq <- data. frame (do. call ("rbind" , interhapfreq) ) 
colnames(freq) <- (-alim) : alim 

plot(gs, freq[, alim+1] , type = "1", 
xlab = "Number of generations" , 
ylab = "Frequency" , ylim = range (freq)) 

for (a in 1 : alim) { 

11 <- (alim+l)-a 

12 <- (alim+1) +a 

lines(gs, freq[, il] , type = "1" , lty = a + 1) 
lines(gs, freq[, 12], type = "1" , lty = a + 1) 

} 

others <- l-apply(freq, 1, sum) 

lines (gs, others, type = "1", lty = alim + 2) 

legend ("topright " , 

legend = c (paste ("Allele", c(0, paste("+/-" , l:alim))) , 

"Other alleles"), 
lty = 1 : (alim+2)) 

Note that we only simulate one locus and set the population size quite large to get the 
asymptotic behaviour. The resulting plot can be seen in Figure 5. 

5.3. Genetic drift of alleles depending on mutation rate 

To illustrate how genetic drift in terms of changed allele frequencies for the allele occurs 
depending on the mutation rate, the allele frequencies after a different number of generations 
are recorded for populations with different mutation rates. Thus, genetic drift depending on 
mutation rate may be investigated as follows (k is the number of individuals in the initial 
population and gs is which generations to sample allele frequencies from): 

1 i brary ("fwsim") 

mus <- c (0.001, 0.002, 0.003) 

k <- 100000000 

g <- 10000 

gs <- seq(100, g - 1, by = 100) 
set . seed(l) 

freqs <- lapply(mus , function(mu) { 

pop <- fwsim(g = g, k = k, r = 1, alpha = 1, mu = mu, save.gs = gs, 

trace = FALSE) 
sapply (pop$intermediate . haplotypes [gs] , 

function(hap) hap$N[which(hap[, 1] == 0)] / sum(hap)) 

}) 
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Figure 5: Simulated genetic drift using an initial population of size 100,000,000, a growth of 
1 (meaning no expected growth), and a mutation rate of 0.003. 



plot(gs, freqs[[l]], type = "1", 

xlab = "Number of generations" , ylab = "Frequency for allele 0" , 
ylim = range (unlist (lapply (freqs , range))) , Ity = 1) 

for (i in 2 : length (mus)) lines(gs, freqs [ [i] ] , type = "1", Ity = i) 

legend("topright" , legend = paste ("mu = ", mus, sep = ""), 
Ity = 1 : length (mus) ) 



Note that we only simulate one locus and set the population size quite large to get the 
asymptotic behaviour. The resulting plot can be seen in Figure 6. 
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Figure 6: Simulated genetic drift using a population size of 100,000,000 and a growth of 1 
(meaning no expected growth). 
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